PhraseFix: Statistical Post-Editing of TectoMT

Authors

  • Petra Galuščáková
  • Martin Popel
  • Ondřej Bojar
Abstract

We present two English-to-Czech systems that took part in the WMT 2013 shared task: TECTOMT and PHRASEFIX. The former is a deep-syntactic transfer-based system; the latter is a more-or-less standard statistical post-editing (SPE) system applied on top of TECTOMT. In a brief survey, we put SPE in context with other system combination techniques and compare SPE with another simple system combination technique: using synthetic parallel data from TECTOMT to train a statistical MT (SMT) system. We confirm that PHRASEFIX (SPE) improves the output of TECTOMT, and we use this to analyze errors in TECTOMT. However, we also show that extending data for SMT is more effective.

Similar resources

TectoMT – a Deep-Linguistic Core of the Combined Chimera MT system

Chimera is a machine translation system that combines the TectoMT deep-linguistic core with the phrase-based MT system Moses. For the English–Czech pair it also uses the Depfix post-correction system. All the components run on the Unix/Linux platform and are open source (available from the Perl repository CPAN and the LINDAT/CLARIN repository). The main website is https://ufal.mff.cuni.cz/tectomt. The developme...

The Significance of Peer-Editing in Teaching Writing to EFL Students

This study set out to investigate the effect of peer-editing as a metacognitive strategy on the development of writing. It was hypothesized that peer-editing could be used to raise grammatical and compositional awareness of the learners. Forty pre-intermediate sophomores at Islamic Azad University-Tabriz Branch participated in the study, taking the course Writing I. To warrant the initial homo...

What a Transfer-Based System Brings to the Combination with PBMT

We present a thorough analysis of a combination of a statistical and a transfer-based system for English→Czech translation, Moses and TectoMT. We describe several techniques for inspecting such a system combination which are based both on automatic and manual evaluation. While TectoMT often produces bad translations, Moses is still able to select the good parts of them. In many cases, TectoMT pr...

English-Czech Machine Translation Using TectoMT

English to Czech machine translation as it is implemented in the TectoMT system consists of three phases: analysis, transfer and synthesis. The system uses tectogrammatical (deep-syntactic dependency) trees as the transfer medium. Each phase is divided into so-called blocks, which are processing units that solve linguistically interpretable tasks (e.g., statistical part-of-speech tagging or rul...

Moses & Treex Hybrid MT Systems Bestiary

Moses is a well-known representative of the phrase-based statistical machine translation systems family, which are known to be extremely poor in explicit linguistic knowledge, operating on flat language representations, consisting only of tokens and phrases. Treex, on the other hand, is a highly linguistically motivated NLP toolkit, operating on several layers of language representation, rich i...

Publication date: 2013